debakarr
GitHub Repository: debakarr/machinelearning
Path: blob/master/Part 2 - Regression/Simple Linear Regression/[R] Simple Linear Regression.ipynb
Kernel: R
library("IRdisplay")
display_png(file="img/01.png")
Image in a Jupyter notebook

  • b0 is the constant (intercept), representing the base salary of anyone who enters the profession with no experience, i.e. Experience = 0

  • b1 is the coefficient, representing the slope: the more experience, the greater the raise in salary.

Here in the graph, the black line is the Best Fitting Line.
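The line described by the two bullets above can be sketched directly. This is a minimal illustration of Salary = b0 + b1 * Experience; the values of b0 and b1 here are made up for demonstration, not the fitted coefficients.

```r
# Illustrative regression line: Salary = b0 + b1 * Experience.
# b0 and b1 below are placeholder values, not the fitted ones.
b0 <- 27000   # base salary at zero years of experience (intercept)
b1 <- 9000    # raise in salary per additional year of experience (slope)

experience <- c(0, 1, 5, 10)
salary <- b0 + b1 * experience
salary
```

With zero years of experience the prediction is just b0; every extra year adds b1.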


display_png(file="img/02.png")
Image in a Jupyter notebook

Actual value vs Model value and Ordinary Least Squares

display_png(file="img/03.png")
Image in a Jupyter notebook

Data Preprocessing

# Importing the dataset
dataset = read.csv('Salary_Data.csv')

# Splitting the dataset into the Training set and Test set
# install.packages('caTools')
library(caTools)
set.seed(123)
split = sample.split(dataset$Salary, SplitRatio = 0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

# Feature Scaling
# training_set = scale(training_set)
# test_set = scale(test_set)
training_set
test_set

Fitting Simple Linear Regression to the Training Set

regressor = lm(formula = Salary ~ YearsExperience, data = training_set)
summary(regressor)
Call:
lm(formula = Salary ~ YearsExperience, data = training_set)

Residuals:
    Min      1Q  Median      3Q     Max
-7853.2 -3691.2   904.8  3191.0  8080.8

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      27232.5     2474.3   11.01 6.17e-10 ***
YearsExperience   9103.7      392.9   23.17 6.38e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5471 on 20 degrees of freedom
Multiple R-squared:  0.9641,	Adjusted R-squared:  0.9623
F-statistic:   537 on 1 and 20 DF,  p-value: 6.382e-16

The smaller the p-value, the more significant the independent variable is in the formula for the dependent variable.

Watch this video for more information on p-values
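The per-coefficient p-values printed in the summary above can also be extracted programmatically with `coef(summary(...))`. A small self-contained sketch (it refits the model on a toy data frame, since the real `Salary_Data.csv` is not bundled here):

```r
# Sketch: pull per-coefficient p-values out of an lm() summary.
# A toy data set stands in for Salary_Data.csv so the snippet runs on its own.
set.seed(42)
toy <- data.frame(YearsExperience = 1:10,
                  Salary = 30000 + 9000 * (1:10) + rnorm(10, sd = 1000))
regressor <- lm(Salary ~ YearsExperience, data = toy)

coefs <- coef(summary(regressor))   # matrix: Estimate, Std. Error, t value, Pr(>|t|)
p_values <- coefs[, "Pr(>|t|)"]     # the Pr(>|t|) column holds the p-values
p_values
```

With a slope this strong relative to the noise, the p-value for YearsExperience comes out far below 0.05, matching the `***` significance code in the summary.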


Predicting the Test set results

y_pred = predict(regressor, newdata = test_set)
y_pred
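One simple way to gauge how good these test-set predictions are is the root mean squared error (RMSE) between predicted and actual salaries. The vectors below are illustrative stand-ins, not the notebook's real `y_pred` and `test_set$Salary`:

```r
# Sketch: RMSE between actual and predicted salaries.
# These vectors are illustrative; in the notebook you would use
# test_set$Salary and y_pred instead.
actual    <- c(37731, 43525, 56957)
predicted <- c(40000, 42000, 55000)

rmse <- sqrt(mean((actual - predicted)^2))
rmse
```

A smaller RMSE means the fitted line tracks the held-out salaries more closely; it is in the same units as Salary, which makes it easy to interpret.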

Visualising the Training set results

  • X = Years of Experience

  • Y = Salary

# install.packages('ggplot2')  # package to plot graphs
library(ggplot2)

# geom for geometrical objects
ggplot() +
  geom_point(aes(x = training_set$YearsExperience, y = training_set$Salary),
             colour = 'red') +
  geom_line(aes(x = training_set$YearsExperience,
                y = predict(regressor, newdata = training_set)),
            colour = 'green') +
  ggtitle('Salary vs Experience (Training Set)') +
  xlab('Years of Experience') +
  ylab('Salary')
Image in a Jupyter notebook

Visualising the Test set results

ggplot() +
  geom_point(aes(x = test_set$YearsExperience, y = test_set$Salary),
             colour = 'red') +
  geom_line(aes(x = training_set$YearsExperience,
                y = predict(regressor, newdata = training_set)),
            colour = 'green') +
  ggtitle('Salary vs Experience (Test Set)') +
  xlab('Years of Experience') +
  ylab('Salary')
Image in a Jupyter notebook